NSF PAR Search | NSF Public Access Repository

Natural Language Querying on NoSQL Databases: Opportunities and Challenges

https://doi.org/10.1109/BigData62323.2024.10825998

Zhang, Wenlong; Shi, Tian; Wang, Ping (December 2024, IEEE)

Natural Language Querying on Domain-Specific NoSQL Database with Large Language Models

https://doi.org/10.1109/BIBM62325.2024.10822485

Zhang, Wenlong; He, Chengyang; Yang, Guanqun; Bandyopadhyay, Dipankar; Shi, Tian; Wang, Ping (December 2024, IEEE)

Task as Context: A Sensemaking Perspective on Annotating Inter-Dependent Event Attributes with Non-Experts

https://doi.org/10.1609/hcomp.v11i1.27550

Li, Tianyi; Wang, Ping; Shi, Tian; Bian, Yali; Esakia, Andy (November 2023, Proceedings of the AAAI Conference on Human Computation and Crowdsourcing)

This paper explores the application of sensemaking theory to support non-expert crowds in intricate data annotation tasks. We investigate the influence of procedural context and data context on the annotation quality of novice crowds, defining procedural context as completing multiple related annotation tasks on the same data point, and data context as annotating multiple data points with semantic relevance. We conducted a controlled experiment involving 140 non-expert crowd workers, who generated 1400 event annotations across various procedural and data context levels. Assessments of annotations demonstrate that high procedural context positively impacts annotation quality, although this effect diminishes with lower data context. Notably, assigning multiple related tasks to novice annotators yields comparable quality to expert annotations, without costing additional time or effort. We discuss the trade-offs associated with procedural and data contexts and draw design implications for engaging non-experts in crowdsourcing complex annotation tasks.

Text-to-ESQ: A Two-Stage Controllable Approach for Efficient Retrieval of Vaccine Adverse Events from NoSQL Database

https://doi.org/10.1145/3584371.3613008

Zhang, Wenlong; Zeng, Kangping; Yang, Xinming; Shi, Tian; Wang, Ping (September 2023, ACM)

Attention-based aspect reasoning for knowledge base question answering on clinical notes

https://doi.org/10.1145/3535508.3545518

Wang, Ping; Shi, Tian; Agarwal, Khushbu; Choudhury, Sutanay; Reddy, Chandan K. (August 2022, BCB '22: Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics)

A Simple and Effective Self-Supervised Contrastive Learning Framework for Aspect Detection

Shi, Tian; Li, Liuqing; Wang, Ping; Reddy, Chandan K. (January 2021, Proceedings of the AAAI Conference on Artificial Intelligence)

Text-to-SQL Generation for Question Answering on Electronic Medical Records

Wang, Ping; Shi, Tian; Reddy, Chandan K. (April 2020, Proceedings of The Web Conference (WWW))

Electronic medical records (EMR) contain comprehensive patient information and are typically stored in a relational database with multiple tables. Effective and efficient patient information retrieval from EMR data is a challenging task for medical experts. Question-to-SQL generation methods tackle this problem by first predicting the SQL query for a given question about a database, and then executing the query on the database. However, most existing approaches have not been adapted to the healthcare domain due to a lack of healthcare Question-to-SQL datasets for learning domain-specific models. In addition, the widespread use of abbreviated terminology and possible typos in questions introduce additional challenges for accurately generating the corresponding SQL queries. In this paper, we tackle these challenges by developing a deep learning based TRanslate-Edit Model for Question-to-SQL (TREQS) generation, which adapts the widely used sequence-to-sequence model to directly generate the SQL query for a given question, and further performs the required edits using an attentive-copying mechanism and task-specific look-up tables. Based on a widely used, publicly available electronic medical database, we create a new large-scale Question-SQL pair dataset, named MIMICSQL, in order to perform the Question-to-SQL generation task in the healthcare domain. We conduct an extensive set of experiments to evaluate the performance of our proposed model on MIMICSQL. Both quantitative and qualitative experimental results indicate the flexibility and efficiency of our proposed method in predicting condition values and its robustness to random questions with abbreviations and typos.

Deep Reinforcement Learning for Sequence-to-Sequence Models

https://doi.org/10.1109/TNNLS.2019.2929141

Keneshloo, Yaser; Shi, Tian; Ramakrishnan, Naren; Reddy, Chandan K. (July 2020, IEEE Transactions on Neural Networks and Learning Systems)

In recent times, sequence-to-sequence (seq2seq) models have gained a lot of popularity and provide state-of-the-art performance in a wide variety of tasks, such as machine translation, headline generation, text summarization, speech-to-text conversion, and image caption generation. The underlying framework for all these models is usually a deep neural network comprising an encoder and a decoder. Although simple encoder–decoder models produce competitive results, many researchers have proposed additional improvements over these seq2seq models, e.g., using an attention-based model over the input, pointer-generation models, and self-attention models. However, such seq2seq models suffer from two common problems: 1) exposure bias and 2) inconsistency between training and test measurements. Recently, a completely novel point of view has emerged in addressing these two problems in seq2seq models, leveraging methods from reinforcement learning (RL). In this survey, we consider seq2seq problems from the RL point of view and provide a formulation combining the power of RL methods in decision-making with seq2seq models that can retain long-term memory. We present some of the most recent frameworks that combine the concepts from RL and deep neural networks. Our work aims to provide insights into some of the problems that inherently arise with current approaches and how we can address them with better RL models. We also provide the source code for implementing most of the RL models discussed in this paper to support the complex task of abstractive text summarization and provide some targeted experiments for these RL models, both in terms of performance and training time.

Tensor-based Temporal Multi-Task Survival Analysis

https://doi.org/10.1109/TKDE.2020.2967700

Wang, Ping; Shi, Tian; Reddy, Chandan K. (January 2020, IEEE Transactions on Knowledge and Data Engineering)

Survival analysis aims to predict the time to an event of interest, along with its probability, from longitudinal data. It is commonly used to make predictions for a single specific event of interest at a given time point. However, many applications require predicting the occurrence of multiple events simultaneously and dynamically. An intuitive way to solve this problem is to simply apply the regular survival analysis method independently to each task at each time point. However, this often leads to a suboptimal solution, since the underlying dependencies between tasks are ignored; this motivates us to analyze these tasks jointly to select common features shared across all tasks. In this paper, we formulate a temporal Multi-Task learning framework (MTMT) using tensor representation. More specifically, given a survival dataset and a sequence of time points, which are considered the monitored time points, we model each task at each time point as a regular survival analysis problem and optimize them simultaneously. We demonstrate the performance of the MTMT model on two real-world datasets and show its superior performance compared to several state-of-the-art models. We also provide the list of important features selected to demonstrate the interpretability of our model.

LeafNATS: An Open-Source Toolkit and Live Demo System for Neural Abstractive Text Summarization

https://doi.org/10.18653/v1/N19-4012

Shi, Tian; Wang, Ping; Reddy, Chandan K. (June 2019, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations))

Neural abstractive text summarization (NATS) has received a lot of attention in the past few years from both industry and academia. In this paper, we introduce an open-source toolkit, namely LeafNATS, for training and evaluation of different sequence-to-sequence based models for the NATS task, and for deploying the pre-trained models to real-world applications. The toolkit is modularized and extensible in addition to maintaining competitive performance in the NATS task. A live news blogging system has also been implemented to demonstrate how these models can aid blog/news editors by providing them suggestions of headlines and summaries of their articles.